{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "dec14640",
   "metadata": {},
   "source": [
    "# PyTorch 基础 : 神经网络包nn和优化器optm\n",
    "torch.nn是专门为神经网络设计的模块化接口。nn构建于 Autograd之上，可用来定义和运行神经网络。 这里我们主要介绍几个一些常用的类\n",
    "\n",
    "**约定：torch.nn 我们为了方便使用，会为他设置别名为nn，本章除nn以外还有其他的命名约定**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "08604e08",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2.0.1'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 首先要引入相关的包\n",
    "import torch\n",
    "# 引入torch.nn并指定别名\n",
    "import torch.nn as nn\n",
    "#打印一下版本\n",
    "torch.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76826001",
   "metadata": {},
   "source": [
    "**torch.nn 中实现的都是一个个的类，是用class xx()定义的，而 nn.functional中的函数，就是是纯函数，由def xx( )定义。**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed8a32e9",
   "metadata": {},
   "source": [
    "除了nn别名以外，我们还引用了nn.functional，这个包中包含了神经网络中使用的一些常用函数，这些函数的特点是，不具有可学习的参数(如ReLU，pool，DropOut等)，这些函数可以放在构造函数中，也可以不放，但是这里建议不放。\n",
    "\n",
    "一般情况下我们会将nn.functional 设置为大写的F，这样缩写方便调用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "21e05810",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.nn.functional as F"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dfa00a7c",
   "metadata": {},
   "source": [
    "# 定义一个网络\n",
    "PyTorch中已经为我们准备好了现成的网络模型，只要继承nn.Module，并实现它的forward方法，PyTorch会根据autograd，自动实现backward函数，在forward函数中可使用任何tensor支持的函数，还可以使用if、for循环、print、log等Python语法，写法和标准的Python写法一致。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ded4c96",
   "metadata": {},
   "source": [
    "> 在PyTorch中，当你调用一个继承自nn.Module的类的实例时，PyTorch会自动调用这个类的forward方法。这是因为forward方法定义了模型的前向传播逻辑，它告诉PyTorch如何将输入数据传递通过网络的各个层，最终得到输出。\n",
    "\n",
    "当你执行以下代码时：\n",
    "\n",
    "```python\n",
    "out = net(input)\n",
    "```\n",
    "实际上是在调用了net对象的forward方法，将输入input传递给模型，然后PyTorch会按照forward方法中定义的顺序执行卷积、激活、池化等操作，最终生成输出张量out。这个过程是PyTorch的自动求导机制的一部分，它用于构建计算图以支持梯度计算和反向传播。\n",
    "\n",
    "因此，你不需要手动调用forward方法，PyTorch会在你使用模型对象时自动执行它，这样可以使代码更简洁和易于理解。forward方法的定义是实现前向传播逻辑的关键，你只需要关注定义好的模型结构和输入数据，PyTorch会负责执行前向传播。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a5e8ca31",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Net(\n",
      "  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))\n",
      "  (fc1): Linear(in_features=1350, out_features=10, bias=True)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "class Net(nn.Module):\n",
    "    def __init__(self):\n",
    "        # nn.Module子类的函数必须在构造函数中执行父类的构造函数\n",
    "        super(Net, self).__init__()\n",
    "        \n",
    "        # 卷积层 '1'表示输入图片为单通道， '6'表示输出通道数，'3'表示卷积核为3*3\n",
    "        self.conv1 = nn.Conv2d(1, 6, 3) \n",
    "        #线性层，输入1350个特征，输出10个特征\n",
    "        self.fc1 = nn.Linear(1350, 10)  #这里的1350是如何计算的呢？这就要看后面的forward函数\n",
    "        \n",
    "    #正向传播 \n",
    "    def forward(self, x): \n",
    "        print(x.size()) # 结果：[1, 1, 32, 32]\n",
    "        # 卷积 -> 激活 -> 池化 \n",
    "        x = self.conv1(x) #根据卷积的尺寸计算公式，计算结果是30，具体计算公式后面第二章第四节 卷积神经网络 有详细介绍。\n",
    "        x = F.relu(x)\n",
    "        print(x.size()) # 结果：[1, 6, 30, 30]\n",
    "        x = F.max_pool2d(x, (2, 2)) #我们使用池化层，计算结果是15\n",
    "        x = F.relu(x)\n",
    "        print(x.size()) # 结果：[1, 6, 15, 15]\n",
    "        # reshape，‘-1’表示自适应\n",
    "        #这里做的就是压扁的操作 就是把后面的[1, 6, 15, 15]压扁，变为 [1, 1350]\n",
    "        x = x.view(x.size()[0], -1) \n",
    "        print(x.size()) # 这里就是fc1层的的输入1350 \n",
    "        x = self.fc1(x)        \n",
    "        return x\n",
    "\n",
    "net = Net()\n",
    "print(net)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e62d693",
   "metadata": {},
   "source": [
    "网络的可学习参数通过net.parameters()返回"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7ce18623",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parameter containing:\n",
      "tensor([[[[-0.3049, -0.0227,  0.1548],\n",
      "          [-0.0409,  0.0277,  0.1657],\n",
      "          [ 0.0827, -0.3312, -0.1646]]],\n",
      "\n",
      "\n",
      "        [[[ 0.2196,  0.2361,  0.0933],\n",
      "          [ 0.0680, -0.0647, -0.0256],\n",
      "          [-0.1080,  0.1228,  0.1514]]],\n",
      "\n",
      "\n",
      "        [[[-0.3097,  0.1088, -0.1589],\n",
      "          [ 0.2850,  0.1270, -0.2313],\n",
      "          [ 0.3035, -0.1841,  0.0189]]],\n",
      "\n",
      "\n",
      "        [[[ 0.1301,  0.1790,  0.0189],\n",
      "          [-0.1662,  0.1049, -0.2030],\n",
      "          [-0.1473, -0.0201,  0.2222]]],\n",
      "\n",
      "\n",
      "        [[[ 0.0842,  0.0998, -0.1249],\n",
      "          [-0.2695, -0.0016, -0.3258],\n",
      "          [ 0.1645, -0.3051, -0.0229]]],\n",
      "\n",
      "\n",
      "        [[[ 0.2736, -0.2406,  0.1381],\n",
      "          [ 0.1751, -0.0305,  0.0326],\n",
      "          [ 0.1427, -0.0461,  0.3092]]]], requires_grad=True)\n",
      "Parameter containing:\n",
      "tensor([-0.2591,  0.2172,  0.2032,  0.1473,  0.2878, -0.0726],\n",
      "       requires_grad=True)\n",
      "Parameter containing:\n",
      "tensor([[-0.0198,  0.0206,  0.0035,  ...,  0.0152,  0.0067, -0.0199],\n",
      "        [-0.0160, -0.0019,  0.0137,  ..., -0.0126,  0.0237, -0.0261],\n",
      "        [-0.0254, -0.0189,  0.0118,  ...,  0.0135,  0.0044,  0.0181],\n",
      "        ...,\n",
      "        [ 0.0017, -0.0271, -0.0048,  ..., -0.0240, -0.0102,  0.0076],\n",
      "        [ 0.0060,  0.0056, -0.0098,  ..., -0.0272, -0.0082, -0.0052],\n",
      "        [-0.0179,  0.0061, -0.0160,  ..., -0.0120, -0.0147, -0.0067]],\n",
      "       requires_grad=True)\n",
      "Parameter containing:\n",
      "tensor([-0.0239,  0.0235, -0.0257, -0.0246, -0.0070, -0.0118, -0.0200, -0.0053,\n",
      "         0.0061, -0.0005], requires_grad=True)\n"
     ]
    }
   ],
   "source": [
    "for parameters in net.parameters():\n",
    "    print(parameters)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "150b6653",
   "metadata": {},
   "source": [
    "net.named_parameters可同时返回可学习的参数及名称。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "ecd2ea0d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "conv1.weight : torch.Size([6, 1, 3, 3])\n",
      "conv1.bias : torch.Size([6])\n",
      "fc1.weight : torch.Size([10, 1350])\n",
      "fc1.bias : torch.Size([10])\n"
     ]
    }
   ],
   "source": [
    "for name,parameters in net.named_parameters():\n",
    "    print(name,':',parameters.size())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8a32887",
   "metadata": {},
   "source": [
    "forward函数的输入和输出都是Tensor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a11a4a84",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([1, 1, 32, 32])\n",
      "torch.Size([1, 6, 30, 30])\n",
      "torch.Size([1, 6, 15, 15])\n",
      "torch.Size([1, 1350])\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "torch.Size([1, 10])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "input = torch.randn(1, 1, 32, 32) # 这里的对应前面fforward的输入是32\n",
    "out = net(input)\n",
    "out.size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "c08e099c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([1, 1, 32, 32])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "input.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62c31ac9",
   "metadata": {},
   "source": [
    "在反向传播前，先要将所有参数的梯度清零"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "25943203",
   "metadata": {},
   "outputs": [],
   "source": [
    "net.zero_grad() \n",
    "out.backward(torch.ones(1,10)) # 反向传播的实现是PyTorch自动实现的，我们只要调用这个函数即可"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b06a241",
   "metadata": {},
   "source": [
    "注意:torch.nn只支持mini-batches，不支持一次只输入一个样本，即一次必须是一个batch。\n",
    "\n",
    "也就是说，就算我们输入一个样本，也会对样本进行分批，所以，所有的输入都会增加一个维度，我们对比下刚才的input，nn中定义为3维，但是我们人工创建时多增加了一个维度，变为了4维，最前面的1即为batch-size\n",
    "\n",
    "损失函数\n",
    "在nn中PyTorch还预制了常用的损失函数，下面我们用MSELoss用来计算均方误差"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "af3752c3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "28.05804443359375\n"
     ]
    }
   ],
   "source": [
    "y = torch.arange(0,10).view(1,10).float()\n",
    "criterion = nn.MSELoss()\n",
    "loss = criterion(out, y)\n",
    "#loss是个scalar，我们可以直接用item获取到他的python类型的数值\n",
    "print(loss.item()) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "673f7999",
   "metadata": {},
   "source": [
    "# 优化器\n",
    "在反向传播计算完所有参数的梯度后，还需要使用优化方法来更新网络的权重和参数，例如随机梯度下降法(SGD)的更新策略如下：\n",
    "\n",
    "weight = weight - learning_rate * gradient\n",
    "\n",
    "在torch.optim中实现大多数的优化方法，例如RMSProp、Adam、SGD等，下面我们使用SGD做个简单的样例"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "3adc0085",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.optim"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "39d921f9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([1, 1, 32, 32])\n",
      "torch.Size([1, 6, 30, 30])\n",
      "torch.Size([1, 6, 15, 15])\n",
      "torch.Size([1, 1350])\n"
     ]
    }
   ],
   "source": [
    "out = net(input) # 这里调用的时候会打印出我们在forword函数中打印的x的大小\n",
    "criterion = nn.MSELoss()\n",
    "loss = criterion(out, y)\n",
    "#新建一个优化器，SGD只需要要调整的参数和学习率\n",
    "optimizer = torch.optim.SGD(net.parameters(), lr = 0.01)\n",
    "# 先梯度清零(与net.zero_grad()效果一样)\n",
    "optimizer.zero_grad() \n",
    "loss.backward()\n",
    "\n",
    "#更新参数\n",
    "optimizer.step()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "d2l",
   "language": "python",
   "name": "d2l"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
